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RESEARCH  OBJECTIVES 


It  is  now  generally  recognized  that  to  achieve  the  computation 
speeds  necessary  to  solve  the  new  complex  scientific  problems  of 
significant  impact  to  the  DOD  community  requires  radical 
reorganization  of  traditional  algorithms  in  numerical  linear 
algebra.  It  is  not  sufficient  to  implement  old  algorithms  in  a 
parallel  processing  environment.  New  fast  algorithms  for  the 
next  generation  of  supercomputers,  currently  being  previewed  by, 
e.g.,  the  Cray  X-MP,  the  Denelcor  HEP  and  the  Intel  Hypercube, 
are  essential.  In  order  to  meet  the  challenges  of  this  emerging 
new  generation  of  machines,  it  is  the  goal  of  this  project  to 
develop  techniques  in  numerical  linear  algebra  for  efficient 
implementation  on  advanced  multiprocessors.  Significantly, 
applications  of  our  work  to  the  practical  real  world  problems  of 
structural  optimization  and  least  squares  estimation  methods  are 
being  developed. 

In  the  area  of  structural  optimization  we  have  been  concerned 
with  the  fundamental  problem  of  linear  elastic  analysis  -  that 
of  finding  the  stresses  and  strains,  given  a  finite  element 
model  of  a  complex  structure  and  a  set  of  external  loads.  To 
obtain  the  solution  of  this  constrained  minimization  problem  a 
variety  of  algorithms  such  as  the  displacement  method,  the  force 
method,  the  natural  factor  method  and  the  weighted  least  squares 
method  can  be  applied.  While  the  advantages  of  implementing  one 
of  these  methods  over  another  on  serial  computers  have  been 
widely  studied,  the  effects  of  parallelism  in  performing  the 
linear  algebra  computations  have  not. 

Over  the  past  year  we  have  written  several  reports  on  our 
studies  of  fast  algorithms  for  large-scale  structural 
optimization  problems,  including  comprehensive  papers  to  appear 
in  the  SIAM  Journal  on  Algebraic  and  Discrete  Methods.  Linear 
Algebra  and  Its  Applications  and  Numerische  Mathematik .  Our 
goals  over  the  next  2-5  years  continue  to  be  the  development  and 
testing  of  complete  sparse  matrix  finite  element  structural 
optimization  packages  on  machines  such  as  the  Cray  X-MP  with  up 
to  4  processors  and  its  comparison  with  traditional  serial 
methods  in  packages  such  as  NASTRAN.  We  also  intend  to  test 
highly  parallel  segments  of  these  algorithms  on  a  Hypercube  with 
up  to  128  processors,  in  order  to  obtain  a  better  understanding 
of  the  possible  ramifications  of  applying  massive  parallelism  to 
structural  analysis  computations. 

The  principal  thrusts  in  our  work  on  least  squares  methods  and 
applications  to  constrained  minimization,  to  linear  estimation 
and  to  geodetic  adjustment,  have  been  on  developing  techniques 
for  the  solution  of  superlarge  problems  in  a  stable  way,  i.e., 
employing  orthogonal  factorization  techniques,  on  multi¬ 
processors.  A  paper  on  a  parallel  block  iterative  scheme 
developed  for  these  purposes  will  appear  in  Linear  Algebra  and 
Its  Applications.  Another  approach  using  domain  decomposition 
techniqt.  - ^  is  part  of  some  joint  work  between  our  project  and 


1  11 


one  by  A.  Sameh  at  the  University  of  Illinois.  A  paper  on  our 
results  will  be  presented  at  the  forthcoming  SIAM  Conference  on 
Parallel  Processing  in  Norfolk,  Virginia.  Our  short  term  goals 
in  this  area  include  testing  of  these  schemes  using  large-scale 
practical  least  squares  data  on  the  Cray  X-MP  at  the  BSC  this 
fall,  to  be  followed  by  implementation  on  the  Cedar  machine 
being  constructed  at  the  University  of  Illinois.  Our  longer 
range  goals  are  to  investigate  stable  methods  for  least  squares 
computations  which  involve  various  levels  of  parallelism  inc¬ 
luding  domain  decomposition  as  well  as  pipelining  type  schemes 
for  orthogonal  reduction. 

Thus  there  are  two  areas  of  particular  excitement  in  this 
project.  We  are  close  to  the  establishment  of  a  framework  for 

testing  and  comparing  parallel  algorithms  for  structural 
optimization  against  the  more  traditional  approaches  in 
commercial  software.  In  addition,  we  are  developing  new  tools 
for  large-scale  least  squares  computations  necessary  to  meet  the 
challenges  of  solving  data-massive  problems  in  a  stable  way  on 
the  new  generation  of  supercomputers. 

Some  major  results  obtained  during  the  past  year  of  this  project 
are  outlined  in  the  next  section. 

II.  SUMMARY  OF  MAJOR  RESULTS 

The  most  important  reserach  accomplishments  by  the  principal 
investigator  and  co-workers  are  described  below.  These  results 
have  been  obtained  on  four  specific  problems  in  computational 
mathematics  and  applications. 

1 .  Parallel  Algorithms  for  Finite  Element  Structural  Analysis 
on  the  HEP  Multiprocessor . 

A  fundamental  problem  of  linear  elastic  analysis  is  that  of 
finding  the  vectors  of  stresses  and  strains,  given  a  finite 
element  model  of  a  structure  and  a  set  of  external  applied 
loads.  To  obtain  the  solution  to  this  linearly  constrained 
quadratic  minimization  problem  a  variety  of  methods 
involving  the  force  method  or  the  displacemment  method  may 
be  used.  The  primary  consideration  in  using  the  force 
method  is  the  computation  of  a  basis  matrix  Z  for  the  null 
space  of  the  equilibrium  matrix  E  for  the  structure,  whereas 
the  primary  consideration  in  using  the  displacement  method 
is  the  solution  of  the  stiffness  equations  with  coefficient 
matrix  K  =  EkET  where  k  is  the  element  flexibility  matrix. 

One  purpose  of  this  paper  is  to  report  on  the  design  and 
testing  of  a  parallel  version  of  a  highly  effective  but 
computation  intensive  algorithm  for  computing  a  banded  basis 
matrix  Z  for  the  null  space  associated  with  the  force 
method.  FORTRAN  codes  for  both  the  serial  and  parallel 
versions  of  the  algorithm  have  been  implemented  on  the 
Denelcor  HEP  computer  at  the  Argonne  National  Laboratory. 


— - v  .*  .  «  .  • 


Performance  results  using  data  from  some  practical 
structural  problems  indicates  that  the  parallel  version 
executes  significantly  faster  on  the  HEP  than  the  serial 
version,  which  has  also  been  run  on  a  CYBER  205  and  a  Cray 
IS.  Secondly,  a  preliminary  report  on  the  use  of  a 
pipelined  Givens  orthogonal  factorization  scheme  in 
conjunction  with  a  weighted  least  squares  approach  in 
avoiding  the  often  ill-conditioned  stiffness  equations  in 
the  displacement  method  is  given.  The  scheme  is  based  upon 
a  method  recently  proposed  by  Dongarra,  Sameh  and  Sorensen. 
The  effectiveness  of  this  orthogonal  factorization  scheme  on 
structural  analysis  problems  is  currently  being  investigated 
on  the  HEP. 

Convergent  Iterations  for  Computing  Stationary  Distributions 
of  Markov  Chains 


Classical  iterative  schemes  such  as  the  Gauss-Seidel  method 
and  its  variations  constitute  powerful  tools  for  ocmputing 
stationary  distribution  vectors  for  large-scale  Markov 
process,  such  as  those  arising  in  queueing  network  analysis. 
The  coefficient  matrix  A  in  these  processes  in  a  Q-matrix, 
i.e.,  a  singular  irreducible  M-matrix  with  zero  column  sums 
and,  unlike  the  nonsingular  case,  the  classical  iterations 
for  A  do  not  always  converge.  The  purpose  of  this  paper  is 
to  survey  the  recent  literature  and  to  analyze  the  behavior 
of  these  methods  completely  in  terms  of  the  graph  structure 
of  A.  The  results  given  here  hold  under  somewhat  weaker 
assumptions  or  A. 

Updating  LU  Factorizations  for  Computing  Stationary 
Distributions 


The  computation  of  stationary  probability  distributions  for 
Markov  chains  is  important  in  the  analysis  of  many  models  in 
the  mathematical  sciences,  such  a  queueing  network  models, 
input-output  economic  models  and  compartmental  tracer 
analysis  models.  These  computations  often  involve  the 
solution  of  large-scale  homogeneous  linear  equations  by 
Gaussian  elimination,  where  A  is  a  Q-matrix,  i.e.,  A  =  (a,.,) 
is  irreducible,  a. .  £  0  for  all  i  ^  j  and  has  zero  column 
sums.  The  stationary  distributions  are  the  components  of 
the  unique  solution  vector  x  of  positive  components  whose 
sum  is  one.  Stable  direct  methods  for  computing  x  by 
triangular  factorization  A  =  LU  have  received  considerable 
attention  recently  and  the  purpose  of  this  paper  is  to 

e  a  stable  method  for  updating  the  factors  L  and  U  in 
flops  in  the  case  where  a  column  of  A  is  modified. 
Updating  formulas  are  derived  here  using  an  approach  similar 
to  that  for  updating  the  Cholesky  factor  of  a  symmetric 
positive  definite  matrix  after  the  addition  of  a  rank  one 
matrix.  The  algorithm  is  effective  more  generally  for  any 
matrix  that  has  a  stable  LU  factorization  and  for  which  the 
updated  matrix  has  a  stable  LU  factorization.  An  error 
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analysis  for  the  LU  update  algorithm  is  outlined  along  the 
lines  of  that  given  for  the  Cholesky  update  by  Fletcher  and 
Powell.  Details  of  the  algorithm  based  on  the  error 
analysis  and  other  considerations  are  given. 

4 .  A  Parallel  Block  Iterative  Scheme  Applied  to  Computations  in 
Structural  Analysis 

In  this  paper  it  is  shown  how  a  block  cyclic  successive 
over-relaxation  direct-iterative  method  can  be  applied  to 
the  parallel  solution  of  certain  large-scale  linear 
equality-constrained  quadratic  programming  problems.  The 
scheme  is  similar  in  nature  to  those  studied  recently  by 
de  Pillis,  Niethammer  and  Varga  and  by  Markham,  Neumann  and 
Plemmons  for  solving  large  sparse  least  squares  problems. 

It  is  based  upon  a  partitioning  strategy  of  the  fundamental 
matrix  into  a  block  consistently  ordered  2-cyclic  form  where 
the  nonzero  eigenvalues  of  the  Jacobi  matrix  are  all  pure 
imaginary.  The  method  is  shown  to  be  globally  convergent 
and  convergence  rates  are  established. 

Applications  of  the  algorithm  are  discussed  for  large-scale 
structural  analysis  computations  where  it  is  shown  how  the 
algorithm  can  be  adapted  to  the  simultaneous  computation  of 
the  system  forces  and  the  nodal  displacements.  Here, 
advantage  can  be  taken  of  the  special  forms  of  the  matrices 
involved.  In  particular,  it  is  shown  that  much  of  the 
algorithm  lends  itself  to  efficient  implementation  on 
pipelined  vector  machines  and  on  multiprocessors. 

III.  RESEARCH  IN  PROGRESS 

The  research  projects  in  support  of  this  grant  which  are 
currently  underway  are  briefly  described  below.  A  more  complete 
description  of  the  results  of  this  current  work  will  be  given  in 
next  year's  annual  scientific  report. 

1 .  Parallel  Version  of  the  Golub/Plemmons  Algorithm  for  the 
Orthogonal  Factorization  of  Least  Squares  Observation 
Matrices  in  Block  Angular  Form. 


The  Golub/Plemmons  block  orthogonal  factorization  algorithm 
was  designed  for  the  stable  least  squares  adjustment  of 
large  amounts  of  geodetic  data  which  is  assembled  so  that 
the  observation  matrix  has  a  bordered  block  diagonal  form. 
The  algorithm  can  also  be  applied  to  more  general 
substructuring  or  domain  decomposition  methods.  In  this 
work  we  are  investigating  how  a  parallel  version  of  the 
algorithm  can  be  applied  not  only  to  the  factorization 
scheme  but  also  to  the  back  substitution  phase  of  the  least 
squares  method.  The  algorithm  has  been  programmed  for 
testing  on  the  Denelcor  HEP  multiprocessing  computer  at  the 
Argonne  National  Laboratory  and  will  be  implemented  later  on 
the  Cedar  multiprocessor.  This  is  joint  work  with  A.  Sameh 
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Multiprocessor . 

This  research  is  concerned  with  parallel  algorithms  for  the 
iterative  solution  of  systems  of  linear  equations  Ax  =  b, 
where  A  is  nonsingular.  Our  attention  on  this  project  is 
restricted  to  the  implementation  on  a  64  processor  Intel 
Hypercube  of  a  multi-splitting  type  algorithm.  Here, 
overlapping  sets  of  rows  of  A  are  assigned  to  seperate 
processors  which  in  turn  perform  seperate  iterations.  Using 
the  message  passing  communication  scheme  between  processors, 
the  iterations  are  weighted  in  such  a  way  that  convergence 
results  whenever  A  is  either  symmetric  positive  definite  or 
an  M-matrix.  The  communication  complexity  of  our  algorithm 
is  O(21og2p)  where  p  is  the  number  of  processors  being  used. 
The  algorithm  has  previously  been  implemented  by  R.  White  on 
the  shared  memory  Denelcor  HEP  system  at  Argonne.  This  is 
current  work  with  R.  Funder lie  at  the  Oak  Ridge  National 
Laboratory. 

Finite  Element  Calculations  for  Structural  Analysis  on  a 
Hypercube  Multiprocessor . 


A  new  approach  to  calculating  the  vector  f  of  internal 
forces  and  the  associated  stresses,  given  a  finite  element 
model  of  a  structure  and  an  external  load  vector  p,  is  being 
developed  for  implementation  on  a  hypercube  multiprocessing 
system.  This  joint  work  with  M.  T.  Heath  at  the  Oak  Ridge 
National  Laboratory  is  based  upon  solving  the  fundamental 
problem  of  linear  elastic  analysis 

T 

min  f  Ff  subject  to  Ef  =  p, 

where  F  denotes  the  element  flexibility  matrix  and  E  the 
equilibrium  matrix,  by  the  weighting  method  for  linearly 
constrained  least  squares,  thus  avoiding  the  often 
ill-conditioned  stiffness  equations.  The  particular 
formulation  of  this  problem,  based  upon  earlier  work  of 
Berry  and  Plemmons  for  the  Denelcor  HEP  shared  memory 
system,  lends  itself  to  the  solution  on  a  hypercube 
configured  into  a  ring  type  topology,  with  a  central  host 
node.  A  significant  advantage  of  our  scheme  will  be  that 
the  matrix  factors  are  distributed  throughout  the 
processors . 

This  work  is  part  of  our  continuing  efforts  to  test  highly 
parallel  segments  of  alternative  algorithms  to  the 
displacement  method  in  order  to  obtain  a  better  understand 
of  the  possible  ramifications  of  applying  massive 
parallelism  to  structural  analysis  computations. 
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